{
 "cells": [
  {
   "cell_type": "markdown",
   "id": "0",
   "metadata": {},
   "source": [
    "# Object Detection with Ultralytics' YOLO\n",
    "\n",
    "This tutorial demonstrates how to pretrain a YOLO model using LightlyTrain and then fine-tune it for object detection using the `ultralytics` framework. To this end, we will first pretrain on a [25k image subset](https://github.com/giddyyupp/coco-minitrain) of the [COCO dataset](https://cocodataset.org/#home) (only the images, no labels!), and subsequently finetune on the labeled [PASCAL VOC dataset](http://host.robots.ox.ac.uk/pascal/VOC/).\n",
    "\n",
    "## Install Dependencies\n",
    "\n",
    "Install the required packages:\n",
    "\n",
    "- `lightly-train` for pretraining, with support for `ultralytics`' YOLO models\n",
    "- [`supervision`](https://github.com/roboflow/supervision) to visualize some of the annotated pictures"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": null,
   "id": "1",
   "metadata": {},
   "outputs": [],
   "source": [
    "!pip install \"lightly-train[ultralytics]\" \"supervision==0.25.1\" ipywidgets"
   ]
  },
  {
   "cell_type": "markdown",
   "id": "2",
   "metadata": {},
   "source": [
    "> **Important**: LightlyTrain is officially supported on\n",
    "> - Linux: CPU or CUDA\n",
    "> - MacOS: CPU only\n",
    "> - Windows (experimental): CPU or CUDA\n",
    "> \n",
    "> We are planning to support MPS for MacOS.\n",
    "> \n",
    "> Check the [installation instructions](https://docs.lightly.ai/train/stable/installation.html) for more details on installation."
   ]
  },
  {
   "cell_type": "markdown",
   "id": "3",
   "metadata": {},
   "source": [
    "## Pretraining on COCO-minitrain\n",
    "We can download the COCO-minitrain dataset (25k images) directly from HuggingFace..."
   ]
  },
  {
   "cell_type": "code",
   "execution_count": null,
   "id": "4",
   "metadata": {},
   "outputs": [],
   "source": [
    "!wget https://huggingface.co/datasets/bryanbocao/coco_minitrain/resolve/main/coco_minitrain_25k.zip"
   ]
  },
  {
   "cell_type": "markdown",
   "id": "5",
   "metadata": {},
   "source": [
    "... unzip it..."
   ]
  },
  {
   "cell_type": "code",
   "execution_count": null,
   "id": "6",
   "metadata": {},
   "outputs": [],
   "source": [
    "!unzip coco_minitrain_25k.zip"
   ]
  },
  {
   "cell_type": "markdown",
   "id": "7",
   "metadata": {},
   "source": [
    "... and since Lightly**Train** does not require any labels, we can also confidently delete all the labels:"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": null,
   "id": "8",
   "metadata": {},
   "outputs": [],
   "source": [
    "!rm -rf coco_minitrain_25k/labels"
   ]
  },
  {
   "cell_type": "markdown",
   "id": "9",
   "metadata": {},
   "source": [
    "With the dataset ready, we can now start the pretraining. Pretraining with Lightly**Train** could not be easier, you just pass the following parameters: \n",
    " - `out`: you simply state where you want your logs and exported model to go to\n",
    " - `model`: the model that you want to train, e.g. `yolo11s` from Ultralytics\n",
    " - `data`: the path to a folder with images\n",
    "\n",
    " Your data is simply assumed to be an arbitrarily nested folder; LightlyTrain with find all images on its own and since there are no labels required there is no danger of ever using false labels! 🕵️‍♂️\n",
    "\n",
    "It is highly recommended to run this pretraining on a GPU, expect about 60min of training time on Colab's free version!"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": null,
   "id": "10",
   "metadata": {},
   "outputs": [],
   "source": [
    "import lightly_train\n",
    "\n",
    "lightly_train.train(\n",
    "    out=\"out/coco_minitrain_pretrain\",  # Output directory.\n",
    "    model=\"ultralytics/yolo11s.yaml\",  # Pass the YOLO model (use .yaml ending to start with random weights).\n",
    "    data=\"coco_minitrain_25k/images\",  # Path to a directory with training images.\n",
    "    overwrite=True,  # A overwriting so that this cell can be re-run.\n",
    "    epochs=30,  # Number of training epochs.\n",
    ")"
   ]
  },
  {
   "cell_type": "markdown",
   "id": "11",
   "metadata": {},
   "source": [
    "And just like that you pretrained a YOLO11s backbone! 🥳 This backbone can't solve any task yet, so in the next step we will finetune it for object detection on the PASCAL VOC dataset.\n",
    "\n",
    "## Finetuning on PASCAL VOC\n",
    "Now that the pretrained model has been exported, we will further fine-tune the model on the task of object detection. The exported model already has exactly the format that Ultralytics' YOLO expects, so after getting the dataset ready, we can get started with only a few lines! ⚡️ \n",
    "\n",
    "In addition to fine-tuning the pretrained model we will also train a model that we initialize with random weights. This will let us compare the performance between the two, and show the great benefits of pretraining.\n",
    "\n",
    "Expect again a run-time of around 1h each, for fine-tuning from the pretrained model, as well as fine-tuning from randomly initialized weights.\n",
    "\n",
    "### Download the PASCAL VOC Dataset\n",
    "We can download the dataset directly using Ultralytics' API with the `check_det_dataset` function:"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": null,
   "id": "12",
   "metadata": {},
   "outputs": [],
   "source": [
    "from ultralytics.data.utils import check_det_dataset\n",
    "\n",
    "dataset = check_det_dataset(\"VOC.yaml\")"
   ]
  },
  {
   "cell_type": "markdown",
   "id": "13",
   "metadata": {},
   "source": [
    "Ultralytics always downloads your datasets to a fixed location, which you can fetch via their `settings` module:"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": null,
   "id": "14",
   "metadata": {},
   "outputs": [],
   "source": [
    "from ultralytics import settings\n",
    "\n",
    "print(settings[\"datasets_dir\"])"
   ]
  },
  {
   "cell_type": "markdown",
   "id": "15",
   "metadata": {},
   "source": [
    "Inside that directory (<DATASET-DIR>), you will now have the following structure of images and labels:\n",
    "\n",
    "```bash\n",
    "tree -d <DATASET-DIR>/VOC -I VOCdevkit\n",
    "\n",
    ">    datasets/VOC\n",
    ">    ├── images\n",
    ">    │   ├── test2007\n",
    ">    │   ├── train2007\n",
    ">    │   ├── train2012\n",
    ">    │   ├── val2007\n",
    ">    │   └── val2012\n",
    ">    └── labels\n",
    ">        ├── test2007\n",
    ">        ├── train2007\n",
    ">        ├── train2012\n",
    ">        ├── val2007\n",
    ">        └── val2012\n",
    "```\n",
    "\n",
    "### Inspect a few Images\n",
    "\n",
    "Let's use `supervision` and look at a few of the annotated samples to get a feeling of what the data looks like:"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": null,
   "id": "16",
   "metadata": {},
   "outputs": [],
   "source": [
    "import random\n",
    "\n",
    "import matplotlib.pyplot as plt\n",
    "import supervision as sv\n",
    "import yaml\n",
    "from ultralytics import settings\n",
    "from ultralytics.data.utils import check_det_dataset\n",
    "\n",
    "dataset = check_det_dataset(\"VOC.yaml\")\n",
    "\n",
    "detections = sv.DetectionDataset.from_yolo(\n",
    "    data_yaml_path=dataset[\"yaml_file\"],\n",
    "    images_directory_path=f\"{settings['datasets_dir']}/VOC/images/train2012\",\n",
    "    annotations_directory_path=f\"{settings['datasets_dir']}/VOC/labels/train2012\",\n",
    ")\n",
    "\n",
    "with open(dataset[\"yaml_file\"], \"r\") as f:\n",
    "    data = yaml.safe_load(f)\n",
    "\n",
    "names = data[\"names\"]\n",
    "\n",
    "box_annotator = sv.BoxAnnotator()\n",
    "label_annotator = sv.LabelAnnotator()\n",
    "\n",
    "fig, ax = plt.subplots(2, 2, figsize=(10, 10))\n",
    "ax = ax.flatten()\n",
    "\n",
    "detections = [detections[random.randint(0, len(detections))] for _ in range(4)]\n",
    "\n",
    "for i, (path, image, annotation) in enumerate(detections):\n",
    "    annotated_image = box_annotator.annotate(scene=image, detections=annotation)\n",
    "    annotated_image = label_annotator.annotate(\n",
    "        scene=annotated_image,\n",
    "        detections=annotation,\n",
    "        labels=[names[elem] for elem in annotation.class_id],\n",
    "    )\n",
    "    ax[i].imshow(annotated_image[..., ::-1])\n",
    "    ax[i].axis(\"off\")\n",
    "\n",
    "fig.tight_layout()\n",
    "fig.show()"
   ]
  },
  {
   "cell_type": "markdown",
   "id": "17",
   "metadata": {},
   "source": [
    "### Finetuning the Pretrained Model\n",
    "All we have to do is to pass the path to the pretrained model to the `YOLO` class and the rest is the same as always with Ultralytics."
   ]
  },
  {
   "cell_type": "code",
   "execution_count": null,
   "id": "18",
   "metadata": {},
   "outputs": [],
   "source": [
    "from ultralytics import YOLO\n",
    "\n",
    "# Load the exported model.\n",
    "model = YOLO(\"out/coco_minitrain_pretrain/exported_models/exported_last.pt\")\n",
    "\n",
    "# Fine-tune with ultralytics.\n",
    "model.train(\n",
    "    data=\"VOC.yaml\", epochs=10, project=\"logs/voc_yolo11s\", name=\"from_pretrained\"\n",
    ")"
   ]
  },
  {
   "cell_type": "markdown",
   "id": "19",
   "metadata": {},
   "source": [
    "### Finetuning from Random Weights\n",
    "In order to quantify the influence of our pretraining, we also train a model from random weights, in Ultralytics this follows the `.yaml` name convention."
   ]
  },
  {
   "cell_type": "code",
   "execution_count": null,
   "id": "20",
   "metadata": {},
   "outputs": [],
   "source": [
    "from ultralytics import YOLO\n",
    "\n",
    "# Load the exported model.\n",
    "model = YOLO(\"yolo11s.yaml\")\n",
    "\n",
    "# Fine-tune with ultralytics.\n",
    "model.train(data=\"VOC.yaml\", epochs=10, project=\"logs/voc_yolo11s\", name=\"from_scratch\")"
   ]
  },
  {
   "cell_type": "markdown",
   "id": "21",
   "metadata": {},
   "source": [
    "## Evaluating the Model Performance\n",
    "Congratulations, you made it almost to the end! 🎉 The last thing we'll do is to analyze the performance between the two. A very common metric to measure the performance of object detectors is the `mAP50-95` which we plot in the next cell, for both the pretrained model and the model that we trained from scratch."
   ]
  },
  {
   "cell_type": "code",
   "execution_count": null,
   "id": "22",
   "metadata": {},
   "outputs": [],
   "source": [
    "import matplotlib.pyplot as plt\n",
    "import pandas as pd\n",
    "\n",
    "res_scratch = pd.read_csv(\"logs/voc_yolo11s/from_scratch/results.csv\")\n",
    "res_finetune = pd.read_csv(\"logs/voc_yolo11s/from_pretrained/results.csv\")\n",
    "\n",
    "fig, ax = plt.subplots()\n",
    "ax.plot(res_scratch[\"epoch\"], res_scratch[\"metrics/mAP50-95(B)\"], label=\"scratch\")\n",
    "ax.plot(res_finetune[\"epoch\"], res_finetune[\"metrics/mAP50-95(B)\"], label=\"finetune\")\n",
    "ax.set_xlabel(\"Epoch\")\n",
    "ax.set_ylabel(\"mAP50-95\")\n",
    "max_pretrained = res_finetune[\"metrics/mAP50-95(B)\"].max()\n",
    "max_scratch = res_scratch[\"metrics/mAP50-95(B)\"].max()\n",
    "ax.set_title(\n",
    "    f\"Pretraining is {(max_pretrained - max_scratch) / max_scratch * 100:.2f}% better than scratch\"\n",
    ")\n",
    "ax.legend()"
   ]
  },
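  {
   "cell_type": "markdown",
   "id": "22a",
   "metadata": {},
   "source": [
    "The percentage in the plot title is the relative gain of the best pretrained `mAP50-95` over the best from-scratch value. A minimal sketch of that calculation, where the helper name `relative_gain_percent` and the values `0.40` and `0.50` are purely illustrative and not results from this tutorial:\n",
    "\n",
    "```python\n",
    "def relative_gain_percent(baseline: float, improved: float) -> float:\n",
    "    \"\"\"Relative improvement of `improved` over `baseline`, in percent.\"\"\"\n",
    "    return (improved - baseline) / baseline * 100\n",
    "\n",
    "\n",
    "# Hypothetical scores, chosen only to make the arithmetic easy to follow.\n",
    "print(f\"{relative_gain_percent(baseline=0.40, improved=0.50):.2f}%\")\n",
    "```"
   ]
  },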
  {
   "cell_type": "markdown",
   "id": "23",
   "metadata": {},
   "source": [
    "## Next Steps\n",
    " - Go beyond the default distillation pretraining and experiment other pretraining learning methods in LightlyTrain. Check [Methods](https://docs.lightly.ai/train/stable/methods/index.html#methods) for more information.\n",
    " - Try various YOLO models (`YOLOv5`, `YOLOv6`, `YOLOv8`).\n",
    " - Use the pretrained model for other tasks, like [image embeddings](https://docs.lightly.ai/train/stable/embed.html#embed)."
   ]
  }
 ],
 "metadata": {
  "kernelspec": {
   "display_name": "python3",
   "language": "python",
   "name": "python3"
  }
 },
 "nbformat": 4,
 "nbformat_minor": 5
}
